$$H(p, q) = \mathbb{E}_p[-\log q] = H(p) + D_{\mathrm{KL}}(p \,\|\, q).$$

For discrete $p$ and $q$ this means:

$$H(p, q) = -\sum_{x} p(x) \log q(x).$$
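As a quick numerical sanity check, here is a minimal NumPy sketch (the distributions `p` and `q` are made up for illustration) that evaluates the discrete formula and confirms that $H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)$:

```python
import numpy as np

# Made-up discrete distributions over three outcomes (illustration only).
p = np.array([0.5, 0.3, 0.2])   # "true" distribution p
q = np.array([0.4, 0.4, 0.2])   # model distribution q

cross_entropy = -np.sum(p * np.log(q))       # H(p, q) = -sum_x p(x) log q(x)
entropy       = -np.sum(p * np.log(p))       # H(p)
kl_divergence =  np.sum(p * np.log(p / q))   # D_KL(p || q)

# The two printed values agree up to floating-point error.
print(cross_entropy, entropy + kl_divergence)
```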
The logistic loss in logistic regression is sometimes called the cross-entropy loss; it measures how closely the predictions $\hat{y}_n$ match the actual data labels $y_n$:
$$L(\mathbf{w}) = \frac{1}{N}\sum_{n=1}^{N} H(p_n, q_n) = -\frac{1}{N}\sum_{n=1}^{N} \left[ y_n \log \hat{y}_n + (1 - y_n) \log(1 - \hat{y}_n) \right].$$
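A minimal sketch of this loss in NumPy; the function name `binary_cross_entropy`, the labels `y`, and the predicted probabilities `y_hat` are illustrative choices, not part of any particular library:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average logistic (binary cross-entropy) loss over N examples.

    y_true holds labels y_n in {0, 1}; y_pred holds predicted probabilities
    y_hat_n. eps keeps the predictions away from 0 and 1 so log() stays finite.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# Made-up labels and sigmoid outputs for illustration.
y = np.array([1.0, 0.0, 1.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.7, 0.6])
print(binary_cross_entropy(y, y_hat))
```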
Because each data label $y^{(i)}$ has a fixed probability of either 0 or 1 (a one-hot target), in softmax regression the cross-entropy loss is expressed as:
$$J(\theta) = -\left[ \sum_{i=1}^{m} \sum_{k=1}^{K} \mathbf{1}\{y^{(i)} = k\} \log \frac{\exp\!\left(\theta^{(k)\top} x^{(i)}\right)}{\sum_{j=1}^{K} \exp\!\left(\theta^{(j)\top} x^{(i)}\right)} \right]$$
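Below is a small NumPy sketch of $J(\theta)$ under assumed conventions ($\theta$ stored as a $K \times d$ matrix of per-class parameters, $X$ as an $m \times d$ matrix of inputs); the function name and toy data are made up for illustration:

```python
import numpy as np

def softmax_cross_entropy(theta, X, y):
    """Cross-entropy loss J(theta) for softmax regression.

    theta: (K, d) matrix whose k-th row is theta^(k)
    X:     (m, d) matrix whose i-th row is x^(i)
    y:     (m,)   integer class labels in {0, ..., K-1}
    """
    logits = X @ theta.T                          # entry (i, k) = theta^(k)^T x^(i)
    logits -= logits.max(axis=1, keepdims=True)   # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The indicator 1{y^(i) = k} selects the log-probability of the true class.
    return -log_probs[np.arange(X.shape[0]), y].sum()

# Toy data with assumed shapes (m=5 examples, d=3 features, K=3 classes).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([0, 2, 1, 0, 2])
theta = rng.normal(size=(3, 3))
print(softmax_cross_entropy(theta, X, y))
```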
References:

https://en.wikipedia.org/wiki/Cross_entropy
http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/